Multilinguals and Wikipedia Editing

Author: Hale Scott A.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the encyclopedia, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller-sized editions with fewer users have a higher percentage of multilingual users than larger-sized editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Oxford University Research Archive

Modeling the Rise in Internet-based Petitions

Authors: Hale Scott A.; Margetts Helen; Yasseri Taha
Publication date: 14/08/2014

Contemporary collective action, much of which involves social media and other Internet-based platforms, leaves a digital imprint which may be harvested to better understand the dynamics of mobilization. Petition signing is an example of collective action which has gained in popularity with rising use of social media and provides such data for the whole population of petition signatories for a given platform. This paper tracks the growth curves of all 20,000 petitions to the UK government over 18 months, analyzing the rate of growth and outreach mechanism. Previous research has suggested the importance of the first day to the ultimate success of a petition, but has not examined early growth within that day, made possible here through hourly resolution in the data. The analysis shows that the vast majority of petitions do not achieve any measure of success; over 99 percent fail to get the 10,000 signatures required for an official response and only 0.1 percent attain the 100,000 required for a parliamentary debate. We analyze the data through a multiplicative process model framework to explain the heterogeneous growth of signatures at the population level. We define and measure an average outreach factor for petitions and show that it decays very fast (reducing to 0.1% after 10 hours). After 24 hours, a petition's fate is virtually set. The findings seem to challenge conventional analyses of collective action from economics and political science, where the production function has been assumed to follow an S-shaped curve. Comment: Submitted to EPJ Data Science.

arXiv.org e-Print Archive

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate

Authors: Botelho A; Hale Scott; Vidgen B
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2021

Accurate detection and classification of online hate is a difficult task. Implicit hate is particularly challenging as such content tends to have unusual syntax, polysemic words, and fewer markers of prejudice (e.g., slurs). This problem is heightened with multimodal content, such as memes (combinations of text and images), as they are often harder to decipher than unimodal content (e.g., text alone). This paper evaluates the role of semantic and multimodal context for detecting implicit and explicit hate. We show that both text- and visual- enrichment improves model performance, with the multimodal model (0.771) outperforming other models' F1 scores (0.544, 0.737, and 0.754). While the unimodal-text context-aware (transformer) model was the most accurate on the subtask of implicit hate detection, the multimodal model outperformed it overall because of a lower propensity towards false positives. We find that all models perform better on content with full annotator agreement and that multimodal models are best at classifying the content where annotators disagree. To conduct these investigations, we undertook high-quality annotation of a sample of 5,000 multimodal entries. Tweets were annotated for primary category, modality, and strategy. We make this corpus, along with the codebook, code, and final model, freely available.
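
For readers comparing the F1 scores quoted in this abstract, here is a minimal sketch of the standard binary F1 computation, showing why fewer false positives can raise the score even when recall is unchanged. The function name and the counts are hypothetical illustrations, not figures from the paper, and the abstract does not say which averaging scheme the authors used.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1: the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives means precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: both models find the same 80 true positives (equal recall),
# but the second one raises twice as many false positives.
fewer_false_positives = f1_score(tp=80, fp=20, fn=20)
more_false_positives = f1_score(tp=80, fp=40, fn=20)
print(fewer_false_positives > more_false_positives)
```

With recall held fixed, the extra false positives lower precision and therefore F1, which is the kind of effect the abstract attributes to the multimodal model's overall advantage.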

arXiv.org e-Print Archive

Oxford University Research Archive

Does Campaigning on Social Media Make a Difference? Evidence from candidate use of Twitter during the 2015 and 2017 UK Elections

Authors: Bright Jonathan; Bulovsky Andrew; Ganesh Bharath; Hale Scott A; Howard Phil; Margetts Helen
Publication date: 27/07/2018

Social media are now a routine part of political campaigns all over the world. However, studies of the impact of campaigning on social platforms have thus far been limited to cross-sectional datasets from one election period, which are vulnerable to unobserved variable bias. Hence empirical evidence on the effectiveness of political social media activity is thin. We address this deficit by analysing a novel panel dataset of political Twitter activity in the 2015 and 2017 elections in the United Kingdom. We find that Twitter-based campaigning does seem to help win votes, a finding which is consistent across a variety of different model specifications including a first difference regression. The impact of Twitter use is small in absolute terms, though comparable with that of campaign spending. Our data also support the idea that effects are mediated through other communication channels, hence challenging the relevance of engaging in an interactive fashion.

arXiv.org e-Print Archive

Oxford University Research Archive

Mapping the UK Webspace: Fifteen Years of British Universities on the Web

Authors: Cowls Josh; Hale Scott A.; Margetts Helen; Meyer Eric T.; Schroeder Ralph; Yasseri Taha
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

This paper maps the national UK web presence on the basis of an analysis of the .uk domain from 1996 to 2010. It reviews previous attempts to use web archives to understand national web domains and describes the dataset. Next, it presents an analysis of the .uk domain, including the overall number of links in the archive and changes in the link density of different second-level domains over time. We then explore changes over time within a particular second-level domain, the academic subdomain .ac.uk, and compare linking practices with variables, including institutional affiliation, league table ranking, and geographic location. We do not detect institutional affiliation affecting linking practices and find only partial evidence of league table ranking affecting network centrality, but find a clear inverse relationship between the density of links and the geographical distance between universities. This echoes prior findings regarding offline academic activity, which allows us to argue that real-world factors like geography continue to shape academic relationships even in the Internet age. We conclude with directions for future uses of web archive resources in this emerging area of research. Comment: To appear in the proceeding of WebSci 201

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Petition Growth and Success Rates on the UK No. 10 Downing Street Website

Authors: Hale Scott A.; Margetts Helen; Yasseri Taha
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2013

Now that so much of collective action takes place online, web-generated data can further understanding of the mechanics of Internet-based mobilisation. This trace data offers social science researchers the potential for new forms of analysis, using real-time transactional data based on entire populations, rather than sample-based surveys of what people think they did or might do. This paper uses a 'big data' approach to track the growth of over 8,000 petitions to the UK Government on the No. 10 Downing Street website for two years, analysing the rate of growth per day and testing the hypothesis that the distribution of daily change will be leptokurtic (rather than normal) as previous research on agenda setting would suggest. This hypothesis is confirmed, suggesting that Internet-based mobilisation is characterized by tipping points (or punctuated equilibria) and explaining some of the volatility in online collective action. We find also that most successful petitions grow quickly and that the number of signatures a petition receives on its first day is a significant factor in explaining the overall number of signatures a petition receives during its lifetime. These findings have implications for the strategies of those initiating petitions and the design of web sites with the aim of maximising citizen engagement with policy issues. Comment: To appear in proceeding of WebSci'13, May 1-5, 2013, Paris, France.
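
The leptokurtosis hypothesis mentioned in this abstract can be illustrated with a minimal sketch that computes excess kurtosis, which is zero for a normal distribution and positive for a leptokurtic one. The function name and both toy samples are hypothetical, this is the simple population (biased) estimator, and the abstract does not state which statistic or test the authors actually applied.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance**2, minus 3."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for zero variance")
    return m4 / m2 ** 2 - 3.0


# Hypothetical daily changes: almost always no change, with two rare large jumps.
punctuated = [0] * 98 + [10, -10]
# Hypothetical evenly spread changes, for contrast.
evenly_spread = [-2, -1, 0, 1, 2]

print(excess_kurtosis(punctuated) > 0)     # heavy tails: leptokurtic
print(excess_kurtosis(evenly_spread) > 0)  # flat spread: not leptokurtic
```

A strongly positive value for a series dominated by long quiet periods and occasional bursts is the pattern that a punctuated-equilibrium reading of petition growth would predict.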

arXiv.org e-Print Archive

CiteSeerX

Crossref

Oxford University Research Archive

Lost in Translation -- Multilingual Misinformation and its Evolution

Authors: Bovet Alexandre; Cheng Calvin; Hale Scott A.; Quelle Dorian
Publication date: 27/10/2023

Misinformation and disinformation are growing threats in the digital age, spreading rapidly across languages and borders. This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages. First, we find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times. Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries, suggesting that some misinformation permeates language barriers. However, spreading patterns exhibit strong homophily, with misinformation more likely to spread within the same language. To study the evolution of claims over time and mutations across languages, we represent fact-checks with multilingual sentence embeddings and cluster semantically similar claims. We analyze the connected components and shortest paths connecting different versions of a claim finding that claims gradually drift over time and undergo greater alteration when traversing languages. Overall, this novel investigation of multilingual misinformation provides key insights. It quantifies redundant fact-checking efforts, establishes that some claims diffuse across languages, measures linguistic homophily, and models the temporal and cross-lingual evolution of claims. The findings advocate for expanded information sharing between fact-checkers globally while underscoring the importance of localized verification.
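
The clustering step described in this abstract (embed claims, link similar ones, read off connected components) might be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the toy two-dimensional vectors stand in for real multilingual sentence embeddings, and the function names and the 0.9 similarity threshold are hypothetical choices the abstract does not specify.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_claims(embeddings, threshold=0.9):
    """Link claims whose similarity meets the (assumed) threshold and
    return the connected components as sorted lists of claim indices."""
    n = len(embeddings)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy stand-ins for sentence embeddings of five fact-checked claims.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 1.0]]
print(cluster_claims(toy_embeddings))
```

Because components are built transitively, two versions of a claim can end up in the same cluster through a chain of intermediate versions even when they are not directly similar, which is what makes shortest-path analysis between versions meaningful.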

arXiv.org e-Print Archive